Build a Shareable Dashboard Using IPython Notebook and Matplotlib

In this notebook we're going to build a dashboard that we can then share with others.

We'll build the dashboard in IPython Notebook and then use nbconvert, a tool from Jupyter (the new name for IPython Notebook), to convert the notebook into another format, specifically HTML. We can then share that HTML page with others.

Steps to Take

Step 1: Ensure you have the latest version of Jupyter

conda install jupyter

Step 2: Install the nbconvert utility

pip install nbconvert

Step 3: Build the dashboard - that will be all the code that follows

Step 4: Use the nbconvert utility to convert the notebook to an HTML file.

jupyter nbconvert --to html --execute '2 - Build a Shareable Dashboard Using iPython Notebook and Matplotlib.ipynb'

The --execute flag runs the notebook before exporting it. This lets us automate the creation of the dashboard and ensures that it uses the latest data from the database or data file.
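As a rough sketch of how this step might be automated from a Python script, the helpers below assemble the nbconvert command and run it with subprocess. The function names and the optional output_name argument are illustrative assumptions and are not part of nbconvert; only the jupyter nbconvert command and its --to, --execute, and --output flags come from the tool itself.

```python
import subprocess


def build_nbconvert_command(notebook_path, output_name=None):
    """Return the argument list that executes a notebook and exports it to HTML.

    build_nbconvert_command is a hypothetical helper, not an nbconvert API.
    """
    command = ['jupyter', 'nbconvert', '--to', 'html', '--execute', notebook_path]
    if output_name is not None:
        # --output sets the base name of the generated file
        command += ['--output', output_name]
    return command


def export_dashboard(notebook_path, output_name=None):
    """Run nbconvert; raises subprocess.CalledProcessError if the export fails."""
    subprocess.run(build_nbconvert_command(notebook_path, output_name), check=True)


# Example call (assumes jupyter is on the PATH and the notebook exists):
# export_dashboard('2 - Build a Shareable Dashboard Using iPython Notebook and Matplotlib.ipynb')
```

Passing the arguments as a list avoids shell quoting problems with the spaces in the notebook's file name. A script like this could be run by a scheduler so the HTML page is rebuilt whenever the data changes.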

If you'd like more information check out the following links:

* nbconvert: http://nbconvert.readthedocs.org/en/latest/index.html
* Jupyter: http://jupyter.readthedocs.org/en/latest/index.html

Dashboard

For this dashboard we will generate a number of charts from Chapter 4: the distribution analysis, the categorical variable analysis, and the time-series analysis. Our dataset is static, but in practice you will want to point the dashboard at a file that is updated regularly, or use a database query that pulls data for a specific time period, such as the last seven days.
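One way to restrict a regularly updated file to a recent window is to filter on the date column after loading. The sketch below is only an illustration: last_n_days is a hypothetical helper, and it assumes the data has a column (here called Date) that pandas can parse as dates.

```python
import pandas as pd


def last_n_days(df, date_column='Date', days=7, today=None):
    """Return the rows whose date falls within the last `days` days, today included.

    Hypothetical helper for illustration; `today` can be fixed for reproducible runs.
    """
    if today is None:
        today = pd.Timestamp.today()
    today = pd.Timestamp(today).normalize()

    # Drop the time of day so the comparison is made on whole dates
    dates = pd.to_datetime(df[date_column]).dt.normalize()

    start = today - pd.Timedelta(days=days)
    return df[(dates > start) & (dates <= today)]


# Small made-up frame to show the behaviour
sample = pd.DataFrame({
    'Date': pd.to_datetime(['2024-01-03', '2024-01-04', '2024-01-10', '2024-01-11']),
    'Number_of_Casualties': [1, 2, 3, 4],
})
recent = last_n_days(sample, today='2024-01-10')
```

With today fixed at 2024-01-10, the window covers 2024-01-04 through 2024-01-10, so only the two middle rows of the sample are kept.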

Accident Data Distribution Analysis


In [ ]:
# Import the Python libraries we need
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt

%matplotlib inline

In [ ]:
# Define a variable for the accidents data file
accidents_data_file = '/Users/robert.dempsey/Dropbox/Private/Art of Skill Hacking/Books/' \
                      'Python Business Intelligence Cookbook/Data/Stats19-Data1979-2004/Accidents7904.csv'

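# Load the data. on_bad_lines='warn' (pandas 1.3 and later) skips malformed rows
# and reports them; it replaces the removed error_bad_lines and warn_bad_lines
# options. The tupleize_cols option has also been removed from pandas.
accidents = pd.read_csv(accidents_data_file,
                        sep=',',
                        header=0,
                        index_col=False,
                        parse_dates=True,
                        on_bad_lines='warn',
                        skip_blank_lines=True,
                        low_memory=False
                        )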

Weather Conditions


In [ ]:
weather = accidents['Weather_Conditions']

fig = plt.figure()
ax = fig.add_subplot(111)
# Draw the histogram once, spanning the full range of weather condition codes
counts, bins, patches = ax.hist(weather,
                                range=(weather.min(), weather.max()),
                                facecolor='green',
                                edgecolor='gray')
ax.set_xticks(bins)
plt.title('Weather Conditions Distribution')
plt.xlabel('Weather Condition')
plt.ylabel('Count of Weather Condition')
plt.show()

Light Conditions


In [ ]:
accidents.boxplot(column='Light_Conditions',
                  return_type='dict');

Light Conditions Grouped by Weather Conditions


In [ ]:
accidents.boxplot(column='Light_Conditions',
                  by = 'Weather_Conditions',
                  return_type='dict');

Categorical Variable Analysis

Distribution of Casualties by Day of the Week


In [ ]:
# Group once and reuse: total casualties and average casualties per accident.
# The average (sum / count) is a rate, not a probability, so it is labeled as such.
by_day = accidents.groupby('Day_of_Week').Number_of_Casualties
casualty_count = by_day.sum()
casualties_per_accident = by_day.mean()

fig = plt.figure(figsize=(8,4))
ax1 = fig.add_subplot(121)
casualty_count.plot(kind='bar', ax=ax1)
# Set the labels after plotting so pandas does not overwrite them
ax1.set_xlabel('Day of Week')
ax1.set_ylabel('Casualty Count')
ax1.set_title("Casualties by Day of Week")

ax2 = fig.add_subplot(122)
casualties_per_accident.plot(kind='bar', ax=ax2)
ax2.set_xlabel('Day of Week')
ax2.set_ylabel('Average Casualties per Accident')
ax2.set_title("Average Casualties per Accident by Day of Week")

# Keep the two subplots from overlapping
fig.tight_layout()

Time-Series Analysis

Number of Casualties Over Time (Entire Dataset)


In [ ]:
# Create a dataframe containing the total number of casualties by date
casualty_count = accidents.groupby('Date').agg({'Number_of_Casualties': 'sum'})

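# Convert the index to a DatetimeIndex. dayfirst=True assumes the Date column
# holds day/month/year strings; drop it if your dates are in another format.
casualty_count.index = pd.to_datetime(casualty_count.index, dayfirst=True)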

# Sort the index so the plot looks correct
casualty_count.sort_index(inplace=True,
                          ascending=True)

# Plot the data
casualty_count.plot(figsize=(18, 4))

Number of Casualties in the Year 2000


In [ ]:
# Plot one year of the data (.loc selects rows by partial date string)
casualty_count.loc['2000'].plot(figsize=(18, 4))

Number of Casualties in the 1980s (Bar Graph)


In [ ]:
# Select the 1980s, then total the casualty count for each year
eighties = casualty_count.loc['1980-01-01':'1989-12-31']
the1980s = eighties.groupby(eighties.index.year).sum()

# Show the plot
the1980s.plot(kind='bar',
              figsize=(18, 4))

Number of Casualties in the 1980s (Line Graph)


In [ ]:
# Plot the 1980s data as a line graph to better see the differences between years
the1980s.plot(figsize=(18, 4))

Share This Dashboard

To share this dashboard as an HTML file, run the following command in the directory where the notebook is located:

jupyter nbconvert --to html --execute '2 - Build a Shareable Dashboard Using iPython Notebook and Matplotlib.ipynb'
